Hands-on Exercise 10a: Processing and Visualising Flow Data

Author

Christover Manafe

Published

October 22, 2024

Modified

October 22, 2024

1 Overview

Spatial interaction refers to the flow of people, materials, or information between locations in geographical space. It encompasses everything from freight shipments, energy flows, and global trade in rare antiquities, to flight schedules, rush hour congestion, and pedestrian foot traffic.

Each spatial interaction, as an analogy for a set of movements, consists of a discrete origin/destination pair. Each pair can be represented as a cell in a matrix where the rows correspond to the locations (centroids) of origin and the columns correspond to the locations (centroids) of destination. This type of matrix is commonly known as an origin/destination (OD) matrix, or a spatial interaction matrix.

In this hands-on exercise, we will learn how to build an OD matrix using the Passenger Volume by Origin Destination Bus Stops dataset downloaded from LTA DataMall. By the end of this exercise, we will be able to:

  • Import and extract OD data for a selected time interval,

  • Import and save geospatial data (i.e., bus stops and mpsz) into sf tibble data frame objects,

  • Populate planning subzone codes into the bus stops sf tibble data frame,

  • Construct desire lines geospatial data from the OD data, and

  • Visualize passenger volume by origin and destination bus stops using the desire lines data.

2 The Packages

We will use following packages in this exercise

We will use following packages in this exercise:

Package Description
sf Provides functions to manage, process, and manipulate Simple Features, a formal geospatial data standard that specifies a storage and access model of spatial geometries such as points, lines, and polygons.
tidyverse A collection of R packages for data science tasks such as importing, tidying, wrangling, and visualizing data.
tmap Provides functions for creating cartographic-quality static maps or interactive maps using the leaflet API.
DT Provides an R interface to the JavaScript library DataTables.
stplanr Provides functions for solving common problems in transport planning and modelling.

To install and launch all R packages.

pacman::p_load(tmap, sf, DT, stplanr, tidyverse)

3 Preparing the Flow Data

3.1 Downloading the OD data

First, we would need to download the OD data from LTA DataMall. We can follow these steps to do so.

Step Description
1 Request API Access from LTA DataMall website and complete the request form.
2 Install Postman and follow instructions from API User Guide.
3 Search for Passenger Volume by Origin Destination Bus Stops in API User Guide, then use the URL from the documentation into Postman
4

Set Http request in Postman to GET, then add following parameters:

  • Under Params tab:

    • Key: Date

    • Value: 202407

      The data is updated monthly by the 10th, with passenger volumes for the previous month. Since it’s currently October, the most recent available data covers July, August, and September. For this analysis, I am using the data from July.

  • Under Headers tab:

    • Key: AccountKey

    • Value: Use the API Account Key that is sent to our email.

5

Click Send button on the postman, and the API will return a URL in the response that can be used to download the file.

The URL will remain active for 5 minutes.

3.2 Importing the OD data

Next, we will import the downloaded Passenger Volume by Origin Destination Bus Stops dataset using the read_csv() function from the readr package.

odbus <- read_csv("data/aspatial/origin_destination_bus_202407.csv")
Rows: 5616988 Columns: 7
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (5): YEAR_MONTH, DAY_TYPE, PT_TYPE, ORIGIN_PT_CODE, DESTINATION_PT_CODE
dbl (2): TIME_PER_HOUR, TOTAL_TRIPS

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
glimpse(odbus)
Rows: 5,616,988
Columns: 7
$ YEAR_MONTH          <chr> "2024-07", "2024-07", "2024-07", "2024-07", "2024-…
$ DAY_TYPE            <chr> "WEEKDAY", "WEEKDAY", "WEEKDAY", "WEEKDAY", "WEEKD…
$ TIME_PER_HOUR       <dbl> 15, 18, 15, 10, 15, 9, 6, 20, 16, 13, 16, 9, 7, 18…
$ PT_TYPE             <chr> "BUS", "BUS", "BUS", "BUS", "BUS", "BUS", "BUS", "…
$ ORIGIN_PT_CODE      <chr> "07539", "17041", "17009", "75489", "01329", "6012…
$ DESTINATION_PT_CODE <chr> "07518", "17251", "17061", "75009", "04167", "5900…
$ TOTAL_TRIPS         <dbl> 98, 779, 1141, 577, 90, 161, 266, 43, 278, 20, 165…

A quick check of the odbus tibble data frame shows that the values in ORIGIN_PT_CODE and DESTINATION_PT_CODE are already in character data type. However, we will still convert these values into factors using the code chunk below, as it helps categorize the data for further analysis.

odbus$ORIGIN_PT_CODE <- as.factor(odbus$ORIGIN_PT_CODE)
odbus$DESTINATION_PT_CODE <- as.factor(odbus$DESTINATION_PT_CODE) 

3.3 Extracting the study data

For the purpose of this exercise, we will extract commuting flows on weekdays between 6:00 and 9:00 AM.

odbus6_9 <- odbus %>%
  filter(DAY_TYPE == "WEEKDAY") %>%
  filter(TIME_PER_HOUR >= 6 &
           TIME_PER_HOUR <= 9) %>%
  group_by(ORIGIN_PT_CODE,
           DESTINATION_PT_CODE) %>%
  summarise(TRIPS = sum(TOTAL_TRIPS))
`summarise()` has grouped output by 'ORIGIN_PT_CODE'. You can override using
the `.groups` argument.

We can use the datatable package to create interactive tables for the odbus6_9 data frame:

datatable(odbus6_9)
Warning in instance$preRenderHook(instance): It seems your data is too big for
client-side DataTables. You may consider server-side processing:
https://rstudio.github.io/DT/server.html

We will save the output in RDS format for future use, and then re-import the saved RDS file into the R environment.

write_rds(odbus6_9, "data/rds/odbus6_9.rds")
odbus6_9 <- read_rds("data/rds/odbus6_9.rds")

4 Working with Geospatial Data

In this exercise, two geospatial datasets will be used:

  • BusStop: This dataset provides the locations of bus stops as of the last quarter of 2022.

  • MPSZ-2019: This dataset provides the sub-zone boundaries from the URA Master Plan 2019.

Both datasets are in ESRI shapefile format.

4.1 Importing geospatial data

We’ll import the geospatial datasets using the st_read() function.

busstop <- st_read(dsn = "data/geospatial",
                   layer = "BusStop") %>%
  st_transform(crs = 3414)
Reading layer `BusStop' from data source 
  `/Users/cham/project/Geospatial-Analytics/chrismanafe/ISSS626-GAA/hands_on_ex/hands_on_ex10/data/geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 5166 features and 3 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 3970.122 ymin: 26482.1 xmax: 48285.52 ymax: 52983.82
Projected CRS: SVY21
mpsz <- st_read(dsn = "data/geospatial",
                   layer = "MPSZ-2019") %>%
  st_transform(crs = 3414)
Reading layer `MPSZ-2019' from data source 
  `/Users/cham/project/Geospatial-Analytics/chrismanafe/ISSS626-GAA/hands_on_ex/hands_on_ex10/data/geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 332 features and 6 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 103.6057 ymin: 1.158699 xmax: 104.0885 ymax: 1.470775
Geodetic CRS:  WGS 84
mpsz
Simple feature collection with 332 features and 6 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
Projected CRS: SVY21 / Singapore TM
First 10 features:
                 SUBZONE_N SUBZONE_C       PLN_AREA_N PLN_AREA_C       REGION_N
1              MARINA EAST    MESZ01      MARINA EAST         ME CENTRAL REGION
2         INSTITUTION HILL    RVSZ05     RIVER VALLEY         RV CENTRAL REGION
3           ROBERTSON QUAY    SRSZ01  SINGAPORE RIVER         SR CENTRAL REGION
4  JURONG ISLAND AND BUKOM    WISZ01  WESTERN ISLANDS         WI    WEST REGION
5             FORT CANNING    MUSZ02           MUSEUM         MU CENTRAL REGION
6         MARINA EAST (MP)    MPSZ05    MARINE PARADE         MP CENTRAL REGION
7                   SUDONG    WISZ03  WESTERN ISLANDS         WI    WEST REGION
8                  SEMAKAU    WISZ02  WESTERN ISLANDS         WI    WEST REGION
9           SOUTHERN GROUP    SISZ02 SOUTHERN ISLANDS         SI CENTRAL REGION
10                 SENTOSA    SISZ01 SOUTHERN ISLANDS         SI CENTRAL REGION
   REGION_C                       geometry
1        CR MULTIPOLYGON (((33222.98 29...
2        CR MULTIPOLYGON (((28481.45 30...
3        CR MULTIPOLYGON (((28087.34 30...
4        WR MULTIPOLYGON (((14557.7 304...
5        CR MULTIPOLYGON (((29542.53 31...
6        CR MULTIPOLYGON (((35279.55 30...
7        WR MULTIPOLYGON (((15772.59 21...
8        WR MULTIPOLYGON (((19843.41 21...
9        CR MULTIPOLYGON (((30870.53 22...
10       CR MULTIPOLYGON (((26879.04 26...

4.2 Visualize Bus Stop Location

tm_shape(mpsz) +
  tm_polygons() +
tm_shape(busstop) +
  tm_dots(col="red") +
tm_layout(frame = F)

We noticed there are some bus stops that are located outside of Singapore.

5 Geospatial Data Wrangling

5.1 Combining BusStop and mpsz

busstop_mpsz <- st_intersection(busstop, mpsz) %>%
  select(BUS_STOP_N, SUBZONE_C) %>%
  st_drop_geometry()
Warning: attribute variables are assumed to be spatially constant throughout
all geometries
  • st_intersection() is used to perform point and polygon overly and the output will be in point sf object.
  • select() of dplyr package is then use to retain only BUS_STOP_N and SUBZONE_C in the busstop_mpsz sf data frame.
  • 5 bus stops is dropped because it is outside MPSZ boundary (i.e. in Malaysia).
datatable(busstop_mpsz)

Next, we will append the planning subzone code from the busstop_mpsz data frame onto the odbus6_9 data frame.

od_data <- left_join(odbus6_9 , busstop_mpsz,
            by = c("ORIGIN_PT_CODE" = "BUS_STOP_N")) %>%
  rename(ORIGIN_BS = ORIGIN_PT_CODE,
         ORIGIN_SZ = SUBZONE_C,
         DESTIN_BS = DESTINATION_PT_CODE)
Warning in left_join(odbus6_9, busstop_mpsz, by = c(ORIGIN_PT_CODE = "BUS_STOP_N")): Detected an unexpected many-to-many relationship between `x` and `y`.
ℹ Row 16127 of `x` matches multiple rows in `y`.
ℹ Row 674 of `y` matches multiple rows in `x`.
ℹ If a many-to-many relationship is expected, set `relationship =
  "many-to-many"` to silence this warning.

Let’s check for duplicate records before moving forward with the analysis.

duplicate <- od_data %>%
  group_by_all() %>%
  filter(n()>1) %>%
  ungroup()
duplicate
# A tibble: 1,484 × 4
   ORIGIN_BS DESTIN_BS TRIPS ORIGIN_SZ
   <chr>     <fct>     <dbl> <chr>    
 1 09047     02029        10 ORSZ02   
 2 09047     02029        10 ORSZ02   
 3 09047     02049        60 ORSZ02   
 4 09047     02049        60 ORSZ02   
 5 09047     02089        40 ORSZ02   
 6 09047     02089        40 ORSZ02   
 7 09047     02151       113 ORSZ02   
 8 09047     02151       113 ORSZ02   
 9 09047     02161        39 ORSZ02   
10 09047     02161        39 ORSZ02   
# ℹ 1,474 more rows

Because there are duplicate records, let’s run following code chunk to retain the unique records.

od_data <- unique(od_data)

Next, we will update od_data data frame with the planning subzone codes.

od_data <- left_join(od_data , busstop_mpsz,
            by = c("DESTIN_BS" = "BUS_STOP_N")) 
Warning in left_join(od_data, busstop_mpsz, by = c(DESTIN_BS = "BUS_STOP_N")): Detected an unexpected many-to-many relationship between `x` and `y`.
ℹ Row 171 of `x` matches multiple rows in `y`.
ℹ Row 673 of `y` matches multiple rows in `x`.
ℹ If a many-to-many relationship is expected, set `relationship =
  "many-to-many"` to silence this warning.

Let’s check for duplicate records before moving forward with the analysis.

duplicate <- od_data %>%
  group_by_all() %>%
  filter(n()>1) %>%
  ungroup()
duplicate
# A tibble: 1,864 × 5
   ORIGIN_BS DESTIN_BS TRIPS ORIGIN_SZ SUBZONE_C
   <chr>     <chr>     <dbl> <chr>     <chr>    
 1 01013     51071         8 RCSZ10    CCSZ01   
 2 01013     51071         8 RCSZ10    CCSZ01   
 3 01112     51071        77 RCSZ10    CCSZ01   
 4 01112     51071        77 RCSZ10    CCSZ01   
 5 01112     53041         6 RCSZ10    BSSZ01   
 6 01112     53041         6 RCSZ10    BSSZ01   
 7 01113     51071         6 DTSZ01    CCSZ01   
 8 01113     51071         6 DTSZ01    CCSZ01   
 9 01113     82221         1 DTSZ01    GLSZ05   
10 01113     82221         1 DTSZ01    GLSZ05   
# ℹ 1,854 more rows

Here we also got duplicate records, let’s run following code chunk to retain the unique records.

od_data <- unique(od_data)
od_data <- od_data %>%
  rename(DESTIN_SZ = SUBZONE_C) %>%
  drop_na() %>%
  group_by(ORIGIN_SZ, DESTIN_SZ) %>%
  summarise(MORNING_PEAK = sum(TRIPS))
`summarise()` has grouped output by 'ORIGIN_SZ'. You can override using the
`.groups` argument.

Let’s save the output into an rds file format

write_rds(od_data, "data/rds/od_data.rds")
od_data <- read_rds("data/rds/od_data.rds")

6 Visualising Spatial Interaction

In this section, we will learn how to prepare a desire line by using stplanr package.

6.1 Removing intra-zonal flows

We will not plot the intra-zonal flows. The code chunk below will be used to remove intra-zonal flows.

od_data_fij <- od_data[od_data$ORIGIN_SZ!=od_data$DESTIN_SZ,]
write_rds(od_data_fij, "data/rds/od_data_fij.rds")
od_data_fij <- read_rds("data/rds/od_data_fij.rds")

6.2 Creating desire lines

In this code chunk below, od2line() of stplanr package is used to create the desire lines.

flowLine <- od2line(flow = od_data_fij, 
                    zones = mpsz,
                    zone_code = "SUBZONE_C")
Creating centroids representing desire line start and end points.
write_rds(flowLine, "data/rds/flowLine.rds")
flowLine <- read_rds("data/rds/flowLine.rds")

6.3 Visualising the desire lines

To visualise the resulting desire lines, let’s run the code chunk below.

tm_shape(mpsz) +
  tm_polygons() +
flowLine %>%  
tm_shape() +
  tm_lines(lwd = "MORNING_PEAK",
           style = "quantile",
           scale = c(0.1, 1, 3, 5, 7, 10),
           n = 6,
           alpha = 1)
Warning in g$scale * (w_legend/maxW): longer object length is not a multiple of
shorter object length
Warning in g$scale * (x/maxW): longer object length is not a multiple of
shorter object length

When the flow data are very messy and highly skewed like the one shown above, it is wiser to focus on selected flows, for example flow greater than or equal to 5000 as shown below.

tm_shape(mpsz) +
  tm_polygons() +
flowLine %>%  
  filter(MORNING_PEAK >= 5000) %>%
tm_shape() +
  tm_lines(lwd = "MORNING_PEAK",
           style = "quantile",
           scale = c(0.1, 1, 3, 5, 7, 10),
           n = 6,
           alpha = 1)
Warning in g$scale * (w_legend/maxW): longer object length is not a multiple of
shorter object length
Warning in g$scale * (x/maxW): longer object length is not a multiple of
shorter object length

7 Reference

Kam, T. S. Processing and Visualising Flow Data. R for Geospatial Data Science and Analytics. https://r4gdsa.netlify.app/chap15.html